doi: 10.17586/2226-1494-2024-24-2-241-248


ViSL One-shot: generating Vietnamese sign language data set

K. Dang, I. A. Bessmertny


Article in English

For citation:
Dang Khanh, Bessmertny I.A. ViSL One-shot: generating Vietnamese sign language data set. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 241–248. doi: 10.17586/2226-1494-2024-24-2-241-248


Abstract
The development of methods for automatic recognition of objects in a video stream, and of sign language recognition in particular, requires large amounts of video data for training. An established data-enrichment method in machine learning is to distort images and add noise. Linguistic gestures differ from other gestures in that a small change in posture can radically change the meaning of a gesture, which imposes specific requirements on data variability. The novelty of the proposed method is that, instead of distorting frames with affine image transformations, the pose of the sign language speaker is vectorized and then perturbed with noise in the form of random deviations of skeletal elements. To implement controlled gesture variability, the MediaPipe library is used to convert each pose into a vector format in which every vector corresponds to a skeletal element; the image of the figure is then restored from the vector representation. The advantage of this method is the possibility of controlled gesture distortion that corresponds to real deviations in the postures of sign language speakers. The developed video-data enrichment method was tested on a set of 60 words of Indian Sign Language (common to the languages and dialects spoken across India), represented by 782 video fragments. For each word, the most representative gesture was selected and 100 variations were generated; the remaining, less representative gestures were used as test data. The resulting word-level classification and recognition model based on a GRU-LSTM neural network achieves an accuracy above 95 %. The validated method was then applied to a corpus of 4364 videos in Vietnamese Sign Language covering all three regions of Vietnam: Northern, Central, and Southern. This produced 436,400 data samples (100 variations per word, with varying degrees of deviation from the reference gesture) that can be used to develop and improve Vietnamese sign language recognition methods. A disadvantage of the proposed method is that its accuracy depends on the error of the MediaPipe library. The created video dataset can also be used for automatic sign language translation.
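The augmentation scheme described above (convert joints to skeletal vectors, add controlled random deviations, restore joint positions) can be sketched as follows. This is a minimal illustration with a toy four-joint chain and hypothetical noise bounds; in the paper the joint coordinates would come from MediaPipe pose landmarks, and the bone topology and deviation limits are assumptions, not the authors' exact implementation.

```python
import numpy as np

# Toy skeleton: each bone is a (parent_joint, child_joint) pair.
# A real pipeline would use the MediaPipe landmark topology instead.
BONES = [(0, 1), (1, 2), (2, 3)]  # e.g. shoulder -> elbow -> wrist -> fingertip

def to_bone_vectors(joints, bones=BONES):
    """Convert absolute joint coordinates to per-bone vectors."""
    return np.array([joints[c] - joints[p] for p, c in bones])

def perturb(vectors, max_angle_deg=5.0, max_scale=0.02, rng=None):
    """Add controlled noise: rotate each bone vector by a small random
    angle and jitter its length, mimicking natural pose variation.
    The bounds keep the gesture's meaning intact."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for v in vectors:
        theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(theta), np.sin(theta)
        rotated = np.array([[c, -s], [s, c]]) @ v
        out.append(rotated * (1.0 + rng.uniform(-max_scale, max_scale)))
    return np.array(out)

def to_joints(root, vectors, bones=BONES, n_joints=4):
    """Restore absolute joint positions from the (perturbed) bone vectors."""
    joints = np.zeros((n_joints, 2))
    joints[0] = root
    for (p, c), v in zip(bones, vectors):
        joints[c] = joints[p] + v
    return joints

# Generate variations of one reference pose.
reference = np.array([[0.0, 0.0], [0.3, 0.0], [0.55, 0.1], [0.6, 0.15]])
variations = [to_joints(reference[0], perturb(to_bone_vectors(reference)))
              for _ in range(100)]
```

Because the noise is applied per skeletal vector rather than per pixel, each variation remains a plausible pose of the same gesture, which is the property affine image distortions cannot guarantee.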

Keywords: Vietnamese sign language, Indian Sign Language, sign language recognition, MediaPipe, coordinate transformation, vector space, random noise, GRU-LSTM, one-shots, data augmentation



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License